Recovering joint and individual components in facial data
A set of images depicting faces with different expressions or at various ages consists of components that are shared across all images (i.e., joint components), which impart to the depicted object the properties of a human face, and individual components that are related to the different expressions or age groups. Discovering the common (joint) and individual components in facial images is crucial for applications such as facial expression transfer. The problem is rather challenging when dealing with images captured in unconstrained conditions, which are thus possibly contaminated by sparse non-Gaussian errors of large magnitude (i.e., sparse gross errors) and may contain missing data. In this paper, we investigate the use of a method recently introduced in statistics, the so-called Joint and Individual Variance Explained (JIVE) method, for the robust recovery of joint and individual components in visual facial data consisting of an arbitrary number of views. Since JIVE is not robust to sparse gross errors, we propose alternatives that are 1) robust to sparse, gross, non-Gaussian noise, 2) able to automatically determine the rank of the individual components, and 3) able to handle missing data. We demonstrate the effectiveness of the proposed methods in several computer vision applications, namely facial expression synthesis and 2D and 3D face age progression in-the-wild.
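The additive model behind JIVE-style decompositions can be sketched in a few lines of NumPy. The following is a minimal illustration, not the authors' algorithm: each view X_k is split into a joint low-rank part J_k (fit jointly across views), an individual low-rank part A_k, and a sparse error term E_k, using alternating singular-value and entrywise soft thresholding. The threshold values, iteration count, and the simplifying assumption that all views have the same shape are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal step that shrinks rank."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return U @ np.diag(s) @ Vt

def soft(M, tau):
    """Entrywise soft thresholding: proximal step for the sparse error term."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def robust_jive(views, n_iter=50, tau_j=1.0, tau_a=1.0, tau_e=0.1):
    """Alternate updates of joint (J), individual (A_k) and sparse (E_k)
    parts so that X_k ~ J_k + A_k + E_k for every view (toy sketch)."""
    J = [np.zeros_like(X) for X in views]
    A = [np.zeros_like(X) for X in views]
    E = [np.zeros_like(X) for X in views]
    for _ in range(n_iter):
        # Joint structure: one low-rank fit to the stacked residuals of all views.
        stacked = np.vstack([X - Ak - Ek for X, Ak, Ek in zip(views, A, E)])
        J = np.vsplit(svt(stacked, tau_j), len(views))
        # Individual structure: per-view low-rank fit of what the joint part misses.
        A = [svt(X - Jk - Ek, tau_a) for X, Jk, Ek in zip(views, J, E)]
        # Sparse gross errors: entrywise shrinkage of the remaining residual.
        E = [soft(X - Jk - Ak, tau_e) for X, Jk, Ak in zip(views, J, A)]
    return J, A, E
```

The entrywise shrinkage of E_k is what gives robustness to sparse gross errors: large isolated residuals are absorbed into E rather than corrupting the low-rank fits.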
Free-HeadGAN: Neural Talking Head Synthesis with Explicit Gaze Control
We present Free-HeadGAN, a person-generic neural talking head synthesis
system. We show that modeling faces with sparse 3D facial landmarks is
sufficient for achieving state-of-the-art generative performance, without
relying on strong statistical priors of the face, such as 3D Morphable Models.
Apart from 3D pose and facial expressions, our method is capable of fully
transferring the eye gaze from a driving actor to a source identity. Our
complete pipeline consists of three components: a canonical 3D key-point
estimator that regresses 3D pose and expression-related deformations, a gaze
estimation network, and a generator built upon the architecture of
HeadGAN. We further experiment with an extension of our generator that
accommodates few-shot learning through an attention mechanism, for the case
where more than one source image is available. Compared to the latest models
for reenactment and motion transfer, our system achieves higher photo-realism
combined with superior identity preservation, while offering explicit gaze
control.
Generalizing Gaze Estimation with Weak-Supervision from Synthetic Views
Developing gaze estimation models that generalize well to unseen domains and
in-the-wild conditions remains a challenge with no known best solution. This is
mostly due to the difficulty of acquiring ground truth data that cover the
distribution of possible faces, head poses and environmental conditions that
exist in the real world. In this work, we propose to train general gaze
estimation models based on 3D geometry-aware gaze pseudo-annotations, which we
extract from arbitrary unlabelled face images that are abundantly available on
the internet. Additionally, we leverage the observation that head, body and
hand pose estimation benefit from being recast as dense 3D coordinate
prediction, and similarly express gaze estimation as regression of dense 3D eye
meshes. We overcome the absence of compatible ground truth by fitting rigid 3D
eyeballs to existing gaze datasets, and design a multi-view supervision
framework to balance the effect of pseudo-labels during training. We test our
method on the task of gaze generalization, where we demonstrate improvement of
up to compared to the state-of-the-art when no ground truth data are
available, and up to when they are. The project material will become
available for research purposes.
Comment: 13 pages, 12 figures
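The rigid-eyeball idea can be illustrated with a toy NumPy sketch. This is not the paper's implementation: the latitude-longitude sphere discretization, the 12 mm radius, and the function names are assumptions for illustration. A canonical sphere is rotated by gaze pitch and yaw, and the gaze direction can then be read back off the dense mesh.

```python
import numpy as np

def eyeball_mesh(center, radius=0.012, pitch=0.0, yaw=0.0, n=16):
    """Vertices of a rigid spherical eyeball rotated by gaze pitch/yaw.
    The 12 mm radius and lat-long grid are illustrative assumptions."""
    theta = np.linspace(0.0, np.pi, n)
    phi = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    t, p = np.meshgrid(theta, phi, indexing="ij")
    # Canonical sphere; the polar axis (+z at theta=0) plays the gaze axis.
    pts = radius * np.stack([np.sin(t) * np.cos(p),
                             np.sin(t) * np.sin(p),
                             np.cos(t)], axis=-1).reshape(-1, 3)
    # Rotate: yaw about y, then pitch about x.
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return pts @ (Rx @ Ry).T + center

def gaze_from_mesh(verts, center):
    """Gaze direction from the rigid mesh: unit vector from the eyeball
    centre to the polar vertex (index 0 on this grid)."""
    d = verts[0] - center
    return d / np.linalg.norm(d)
```

Because the eyeball is rigid, every vertex of the dense mesh carries the same gaze information, which is what makes dense 3D supervision compatible with ordinary directional gaze labels.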
Deep neural network augmentation: generating faces for affect analysis
This paper presents a novel approach for synthesizing facial affect, either in terms of the six basic expressions (i.e., anger, disgust, fear, joy, sadness and surprise), or in terms of valence (i.e., how positive or negative an emotion is) and arousal (i.e., the power of the emotion activation). The proposed approach accepts the following inputs: (i) a neutral 2D image of a person; (ii) a basic facial expression or a pair of valence-arousal (VA) emotional state descriptors to be generated, or a path of affect in the 2D VA space to be generated as an image sequence. In order to synthesize affect in terms of VA for this person, 600,000 frames from the 4DFAB database were annotated. The affect synthesis is implemented by fitting a 3D Morphable Model to the neutral image, deforming the reconstructed face to add the input affect, and blending the new face, bearing the given affect, into the original image. Qualitative experiments illustrate the generation of realistic images when the neutral image is sampled from fifteen well-known lab-controlled or in-the-wild databases, including Aff-Wild, AffectNet and RAF-DB; comparisons with generative adversarial networks (GANs) show the higher quality achieved by the proposed approach. Quantitative experiments are then conducted, in which the synthesized images are used for data augmentation when training deep neural networks to perform affect recognition over all databases; greatly improved performance is achieved compared with state-of-the-art methods, as well as with GAN-based data augmentation, in all cases.
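The deformation and blending steps can be illustrated schematically. This is a hedged sketch with a generic linear expression basis; the actual mapping from valence-arousal values to deformation parameters is learned from the annotated 4DFAB frames and is not reproduced here, and all names are illustrative.

```python
import numpy as np

def synthesize_affect(neutral_verts, expr_basis, affect_params):
    """Deform a reconstructed neutral face with an affect offset.
    neutral_verts: (N, 3) mesh from the 3DMM fit; expr_basis: (K, N, 3)
    linear expression components; affect_params: (K,) weights, e.g. mapped
    from a valence-arousal pair."""
    offset = np.tensordot(affect_params, expr_basis, axes=1)  # (N, 3)
    return neutral_verts + offset

def blend_into_image(original, rendered_face, alpha_mask):
    """Composite the rendered affective face back into the source image.
    alpha_mask: (H, W) in [0, 1], soft face-region mask."""
    a = alpha_mask[..., None]
    return a * rendered_face + (1.0 - a) * original
```

A soft alpha mask at the face boundary is what avoids visible seams when the deformed face is composited back into the original photograph.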
Multi-Attribute Robust Component Analysis for Facial UV Maps
The collection of large-scale three-dimensional (3-D) face models has led to significant progress in the field of 3-D face alignment “in-the-wild,” with several methods being proposed toward establishing sparse or dense 3-D correspondences between a given 2-D facial image and a 3-D face model. Utilizing 3-D face alignment improves 2-D face alignment in many ways, such as alleviating issues with artifacts and warping effects in texture images. However, the utilization of 3-D face models introduces a new set of challenges for researchers. Since facial images are commonly captured in arbitrary recording conditions, a considerable amount of missing information and gross outliers is observed (e.g., due to self-occlusion, subjects wearing eyeglasses, and so on). To this end, in this paper we propose Multi-Attribute Robust Component Analysis (MA-RCA), a novel technique that is suitable for facial UV maps containing a considerable amount of missing information and outliers, while additionally incorporating, in an elegant manner, knowledge from various available attributes, such as age and identity. We evaluate the proposed method on problems such as UV denoising, UV completion, facial expression synthesis, and age progression, where MA-RCA outperforms the compared techniques.
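The missing-data aspect can be illustrated with a basic hard-imputation baseline. This is not MA-RCA itself, which additionally handles gross outliers and attribute structure; it only shows the core idea of alternating a fixed-rank fit with re-imputation of the unobserved UV-map entries.

```python
import numpy as np

def masked_lowrank_complete(X, mask, rank=1, n_iter=200):
    """Hard-impute completion: alternate a rank-r truncated SVD fit with
    re-imputation of the unobserved entries (mask == False)."""
    Z = np.where(mask, X, 0.0)  # initialize missing entries with zeros
    L = Z
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r fit to Z
        Z = np.where(mask, X, L)                   # keep observed, impute rest
    return L
```

With self-occluded or eyeglass-covered regions marked as missing in the mask, the low-rank structure of the remaining pixels drives the fill-in.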
Towards a complete 3D morphable model of the human head
Three-dimensional Morphable Models (3DMMs) are powerful statistical tools for
representing the 3D shapes and textures of an object class. Here we present the
most complete 3DMM of the human head to date that includes face, cranium, ears,
eyes, teeth and tongue. To achieve this, we propose two methods for combining
existing 3DMMs of different, overlapping head parts: (i) using a regressor to
complete the missing parts of one model using the other, and (ii) using the
Gaussian Process framework to blend covariance matrices from multiple models.
Thus we
build a new combined face-and-head shape model that blends the variability and
facial detail of an existing face model (the LSFM) with the full head modelling
capability of an existing head model (the LYHM). Then we construct and fuse a
highly-detailed ear model to extend the variation of the ear shape. Eye and eye
region models are incorporated into the head model, along with basic models of
the teeth, tongue and inner mouth cavity. The new model achieves
state-of-the-art performance. We use our model to reconstruct full head
representations from single, unconstrained images allowing us to parameterize
craniofacial shape and texture, along with the ear shape, eye gaze and eye
color.
Comment: 18 pages, 18 figures; submitted to the IEEE Transactions on Pattern
Analysis and Machine Intelligence (TPAMI) on the 9th of October as an extension
of the original oral CVPR paper: arXiv:1903.0378
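The covariance-blending idea (method ii) can be sketched as follows: if the fused shape is modeled as W x_a + (I - W) x_b, with per-dimension blending weights W = diag(w) and independent component models, its mean and covariance follow in closed form, and new principal components can be extracted from the fused covariance. This is a simplified stand-in for the paper's Gaussian Process formulation; all names are illustrative.

```python
import numpy as np

def fuse_models(mean_a, cov_a, mean_b, cov_b, w):
    """Blend two Gaussian shape models defined over the same dimensions.
    w: per-dimension weights in [0, 1] favouring model A."""
    n = len(w)
    W = np.diag(w)
    V = np.eye(n) - W
    mean = W @ mean_a + V @ mean_b
    # Covariance of W x_a + (I - W) x_b for independent x_a, x_b.
    cov = W @ cov_a @ W.T + V @ cov_b @ V.T
    # Re-derive principal components of the fused model.
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1]
    return mean, cov, evecs[:, order], np.maximum(evals[order], 0.0)
```

In the paper's setting, the weights would vary smoothly over the overlap between the face and head regions, so the fused model inherits facial detail from one 3DMM and full-head variability from the other.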
Machine learning methods for facial attribute editing
In this thesis, we explore the problem of facial attribute analysis and editing in images from the perspective of two major fields of Machine Learning: Component Analysis (CA) and Deep Learning (DL).
First, we present a CA method for analysing and editing facial data.
Then, we present a DL algorithm for animating facial images according to expressions and speech.
Finally, we present a method for improving gaze estimation generalisation to unseen image domains and showcase applications to eye gaze editing.
Although CA methods are able to capture only linear relationships in data, they can still be useful with well-aligned data, such as UV maps of facial texture.
In this Thesis, we propose robust extensions of the Joint and Individual Variance Explained (JIVE) method, for the recovery of joint and individual components in visual facial data, captured in unconstrained conditions and possibly containing sparse non-Gaussian errors and missing data.
We demonstrate the effectiveness of the proposed methods in several computer vision applications, namely facial expression synthesis and 2D and 3D face age progression in-the-wild.
CA methods usually fall short in image generation, as they fail to generate details.
On the contrary, Image-to-image (i2i) translation, which is the problem of translating images between image domains, has recently seen remarkable progress since the advent of DL and Generative Adversarial Networks (GANs).
In this Thesis, we study the problem of i2i translation, under a set of continuous parameters that correspond to statistical blendshape models of facial motion.
We show that it is possible to edit facial images according to expression and speech blendshapes using ``sliders'', which are more flexible than discrete expressions or action units.
Lastly, realistically animating gaze is crucial for achieving high quality facial animations.
To this end, large datasets of faces with gaze annotations are required for training.
In this Thesis, we present a weakly-supervised method for improving gaze estimation generalization to unseen domains, by harnessing arbitrary unlabelled ``in-the-wild'' face images.
Unlike previous methods, we tackle gaze estimation as the end-to-end, dense 3D reconstruction of the eyes, and experimentally validate the benefits of this choice.
Particularly, we show improvements in semi-supervised and cross-dataset gaze estimation.
Finally, we showcase how our methods can be employed to train efficient models for gaze editing.
The 3D menpo facial landmark tracking challenge
Recently, deformable face alignment has become synonymous with the task of locating a set of sparse 2D landmarks in intensity images. Currently, discriminatively trained Deep Convolutional Neural Networks (DCNNs) are the state-of-the-art in the task of face alignment. DCNNs exploit the large amounts of high-quality annotations that have emerged in the last few years. Nevertheless, the provided 2D annotations rarely capture the 3D structure of the face (this is especially evident in the facial boundary). That is, the annotations neither provide an estimate of the depth nor correspond to the 2D projections of the 3D facial structure. This paper summarises our efforts (a) to develop a very large database suitable for training 3D face alignment algorithms on images captured “in-the-wild”, and (b) to train and evaluate new methods for 3D facial landmark tracking. Finally, we report the results of the first challenge in 3D face tracking “in-the-wild”.
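The gap the paper highlights (2D annotations that are not projections of the 3D structure) can be made concrete with a camera-model sketch: 3D-consistent annotations should satisfy x2d = s * P * R * X + t under some shared camera, whereas independently drawn 2D labels generally will not. The weak-perspective choice and the names below are illustrative assumptions.

```python
import numpy as np

def project_landmarks(pts3d, scale, R, t):
    """Weak-perspective projection of 3D landmarks to the image plane:
    x2d = s * P * R * X + t, where P drops the depth coordinate."""
    P = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    return scale * (pts3d @ R.T) @ P.T + t
```

Checking how well a set of 2D landmarks can be explained by such a projection of a 3D face model is one simple way to quantify whether annotations respect the underlying 3D structure.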